Shentong Mo

Improving Visual Representation Alignment Generation with GRPO

May 30, 2026

Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning

Apr 15, 2026

LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation

Mar 29, 2026

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Mar 09, 2026

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation

Feb 26, 2026

GMAIL: Generative Modality Alignment for generated Image Learning

Feb 17, 2026

SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers

Feb 06, 2026

GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining

Jan 27, 2026

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm

Aug 05, 2025

DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap

Mar 15, 2025